Statistical synthesizer with embedded prosodic and spectral modifications to generate highly intelligible speech in noise

Authors

  • Daniel Erro
  • Tudor-Catalin Zorila
  • Yannis Stylianou
  • Eva Navas
  • Inma Hernáez
Abstract

This paper describes a statistical parametric speech synthesizer that, despite having been trained on an ordinary synthesis database and without any adaptation data, is able to generate highly intelligible speech in noisy environments. By using a simple and flexible vocoder based on a harmonic model, it applies several noise-independent modifications to durations, pitch level and range, energy contour, formant sharpness, and intensity of particular spectral bands. The system has been evaluated by means of a large subjective test, the results of which show that the suggested approach clearly outperforms the reference TTS systems and even unmodified natural speech in some conditions.
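As an illustration of one of the modifications listed in the abstract (pitch level and range), the following is a minimal sketch and not the authors' implementation. It assumes the pitch contour is available as a list of per-frame F0 values in Hz with 0 marking unvoiced frames, and that the modification is done in the log-F0 domain; the function name `modify_pitch` and the parameters `level_factor` and `range_factor` are hypothetical and do not come from the paper.

```python
import math


def modify_pitch(f0_hz, level_factor=1.0, range_factor=1.0):
    """Shift the pitch level and scale the pitch range of an F0 contour.

    Hypothetical helper for illustration only.
    f0_hz: per-frame F0 values in Hz; 0 marks an unvoiced frame (assumption).
    level_factor: multiplies the geometric-mean F0 (pitch level).
    range_factor: scales log-F0 deviations around the mean (pitch range).
    """
    voiced = [f for f in f0_hz if f > 0.0]
    if not voiced:
        # Nothing to modify when every frame is unvoiced.
        return list(f0_hz)

    # Mean of log-F0 over voiced frames, i.e. the log of the geometric mean.
    mean_log = sum(math.log(f) for f in voiced) / len(voiced)
    shift = math.log(level_factor)

    out = []
    for f in f0_hz:
        if f > 0.0:
            deviation = math.log(f) - mean_log
            out.append(math.exp(mean_log + shift + range_factor * deviation))
        else:
            # Unvoiced frames are left untouched.
            out.append(0.0)
    return out
```

For example, `modify_pitch(contour, level_factor=1.2, range_factor=1.5)` would raise the average pitch by 20% and widen the excursions around it; these values are arbitrary and are not taken from the paper.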

Similar resources

Toshiba English text-to-speech synthesizer (TESS)

Toshiba English Text-to-Speech Synthesizer utilizes several new techniques to produce synthesized speech that is more natural-sounding and intelligible than that created by conventional synthesizers. The closed-loop training method creates synthesis units that most closely resemble the training data and are the least susceptible to prosodic distortion noise by analytically solving an equation t...


The Blizzard Challenge 2006 CMU Entry introducing hybrid trajectory-selection synthesis

Acknowledging the lessons of Blizzard Challenge 2005 – that smooth prosodic cadence supersedes spectral resolution – but wanting a system devoid of vocoding artifacts – we introduce a hybrid trajectory-selection synthesizer. Using a parametric synthesizer to generate a pitch-synchronous sequence of F0/duration/power and spectral vectors, this trajectory serves as the target cost function for a ...


A New Model-Based Mandarin-Speech Coding System

In this paper, a new model-based Mandarin-speech coding system is proposed. It employs a prosody-enriched ASR with a hierarchical prosodic model (HPM) to generate from the input speech enriched transcriptions, including linguistic features, prosodic tags and spectral parameters in the encoder. By sending these features to the decoder, we can first reconstruct the prosodic-acoustic features of s...


Increasing speech intelligibility via spectral shaping with frequency warping and dynamic range compression plus transient enhancement

In order to make speech (natural or synthetic) more intelligible for listeners in real-world noisy environments, various modifications have been proposed that exploit spectral and temporal signal features. Previously, an evaluation campaign involving several approaches illustrated that a Spectral Shaping (SS) and Dynamic Range Compression (DRC) method proved highly successful at increasing spee...


Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...



Publication date: 2013